Deep learning-based prediction of post-pancreaticoduodenectomy pancreatic fistula

Postoperative pancreatic fistula is a life-threatening complication with an unmet need for accurate prediction. This study aimed to develop preoperative artificial intelligence-based prediction models. Patients who underwent pancreaticoduodenectomy were enrolled and stratified into model development and validation sets according to whether surgery was performed between 2016 and 2017 or in 2018, respectively. Machine learning models based on clinical and body composition data and deep learning models based on computed tomographic data were developed and combined by ensemble voting, and the final models were selected by comparison with earlier models. Among the 1333 participants (training, n = 881; test, n = 452), postoperative pancreatic fistula occurred in 421 (47.8%) and 134 (31.8%) and clinically relevant postoperative pancreatic fistula occurred in 59 (6.7%) and 27 (6.0%) participants in the training and test datasets, respectively. In the test dataset, the area under the receiver operating characteristic curve [AUC (95% confidence interval)] of the selected preoperative model for predicting all and clinically relevant postoperative pancreatic fistula was 0.75 (0.71–0.80) and 0.68 (0.58–0.78), respectively. The ensemble model showed better predictive performance than the individual machine learning (ML) and deep learning (DL) models.


CT techniques
All CT images were obtained using 16-detector or higher CT systems. For contrast enhancement, the total volume of non-ionic iodinated contrast medium was determined according to the patient's body weight (approximately 2 mL/kg; maximum, 150 mL), and an automatic power injector was used to deliver the contrast agent intravenously (3 mL/s). Portal venous phase (PVP) images were obtained 70–75 seconds after contrast injection. Images were reconstructed using filtered back projection (B30f, B30s, B41f, B41s) or iterative reconstruction (I30s, I30f). Pixel size ranged from 0.53 to 1.11 mm.
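As a small worked illustration of the weight-based dosing described above, the following sketch computes the contrast volume and the resulting injection duration. It reads the 150 mL figure as a cap on total volume; the function and parameter names are hypothetical and not part of the study protocol.

```python
def contrast_volume_ml(weight_kg: float, ml_per_kg: float = 2.0, cap_ml: float = 150.0) -> float:
    """Weight-based contrast volume (about 2 mL/kg), capped at 150 mL.

    Assumption: the 150 mL value quoted in the text is an upper limit on volume.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return min(weight_kg * ml_per_kg, cap_ml)


def injection_time_s(volume_ml: float, rate_ml_per_s: float = 3.0) -> float:
    """Duration of the injection at the power-injector rate quoted in the text (3 mL/s)."""
    return volume_ml / rate_ml_per_s


if __name__ == "__main__":
    volume = contrast_volume_ml(60.0)
    print(volume, injection_time_s(volume))
```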

Body composition analysis
A single PVP axial CT image at the level of the lower endplate of the 3rd lumbar vertebra was used. 1,2 The cross-sectional areas of the total abdominal wall muscle (skeletal muscle area; including psoas, paraspinal, transversus abdominis, rectus abdominis, quadratus lumborum, and internal and external obliques), subcutaneous adipose tissue, and visceral adipose tissue were measured with preestablished thresholds (from −29 to +150 Hounsfield units [HU] for skeletal muscle area and from −190 to −30 HU for subcutaneous and visceral adipose tissues). 3 The body composition parameters were normalized by dividing them by the patient's height squared (cm²/m²) and reported as indices, including the visceral adipose tissue index (VATI), subcutaneous adipose tissue index (SATI), and skeletal muscle index (SMI). Skeletal muscle density (SMD), which represents the degree of myosteatosis, was quantified as the mean HU of the skeletal muscle area; the cutoff points for the presence of myosteatosis were set at 41 and 33 HU for non-overweight and overweight patients, respectively. 4
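The height normalization and the myosteatosis cutoffs above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and it assumes that myosteatosis is flagged when SMD falls strictly below the cutoff, which the text does not state explicitly.

```python
def body_composition_index(area_cm2: float, height_m: float) -> float:
    """Normalize a cross-sectional area (cm^2) by height squared (m^2), giving cm^2/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return area_cm2 / height_m ** 2


def has_myosteatosis(smd_hu: float, overweight: bool) -> bool:
    """Apply the SMD cutoffs quoted in the text: 41 HU (non-overweight), 33 HU (overweight).

    Assumption: SMD strictly below the cutoff counts as myosteatosis.
    """
    cutoff = 33.0 if overweight else 41.0
    return smd_hu < cutoff


if __name__ == "__main__":
    # Example: skeletal muscle area of 150 cm^2 in a patient 1.70 m tall.
    print(round(body_composition_index(150.0, 1.70), 1))
    print(has_myosteatosis(35.0, overweight=False))
```

The same `body_composition_index` call applies to VATI, SATI, and SMI; only the measured area differs.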

Machine learning models
The clinical information and body composition data extracted from the training dataset were used to develop machine learning models. In cases of missing values, the median value of each variable was imputed. 6][7] ANN was trained with the Adam optimizer with a batch size of 257 and a learning rate of 4e-3. TabNet was trained using the Adam optimizer (learning rate, 0.1; batch size, 64). The binary cross-entropy loss function (i.e., the mean negative log-likelihood of the actual labels under the predicted probabilities) was used for the ANN and TabNet. Linear LR with L2 regularization and kernel SVM with a Gaussian kernel were used. The number of trees in RF and GB was 100. All machine learning models, except for ANN and TabNet, were trained using the scikit-learn library in Python 3.8. 6 The deep learning library Keras (version 2.5) was used for the development of the ANN and TabNet models. 7
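Two of the steps above, median imputation and the binary cross-entropy loss, are simple enough to write out. The sketch below uses only the Python standard library rather than scikit-learn or Keras; the helper names and the clipping constant `eps` are assumptions made for illustration, not details taken from the study.

```python
import math
from statistics import median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0);
    the value of eps is an illustrative choice.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    print(impute_median([1.0, None, 3.0, 10.0]))
    print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

In a real pipeline the median would be estimated on the training dataset only and then reused for the test dataset, so that no test information leaks into model development.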

Deep learning models
We developed two-dimensional (2D) convolutional neural network-based deep learning models. CT images underwent several pre-processing steps, including resampling, intensity normalization, augmentation, and cropping. All CT images were resampled to a pixel size of 0.5 × 0.5 mm² using spline interpolation to decrease the variability between scans. 8,9 The image intensities were normalized to the range from 0 to 1 by using −200 and 300 HU as the lower and upper limits, respectively. Image augmentation techniques, such as rotation, shearing, scaling, and modification of the image brightness, were applied to increase the size of the training dataset. 10 For data augmentation, we used rotation angles ranging from −5° to 5° with an interval of 1° and brightness shifts ranging from −0.1 to 0.1 (interval, 0.01). Moreover, scaling and shearing ratios of heights and widths ranging from 95% to 105% with an interval of 1% were utilized. Each data augmentation technique was applied with a probability of 50%, and the parameters were randomly selected within the predefined range. As a final step of pre-processing, a 96 × 96 mm² region centered at the pancreatic neck (the predicted cut surface during pancreaticoduodenectomy) was cropped from the original images.
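The intensity normalization and the random choice of augmentation parameters described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: values outside the HU window are assumed to be clipped, a single ratio is drawn for scaling and for shearing (the text mentions heights and widths separately), and all names are hypothetical.

```python
import random


def normalize_hu(hu, lower=-200.0, upper=300.0):
    """Clip a HU value to [lower, upper] and rescale it linearly to [0, 1].

    Assumption: out-of-window intensities are clipped before rescaling.
    """
    clipped = min(max(hu, lower), upper)
    return (clipped - lower) / (upper - lower)


def crop_size_px(size_mm=96.0, pixel_mm=0.5):
    """Side length in pixels of the cropped region after resampling."""
    return int(round(size_mm / pixel_mm))


def sample_augmentation(rng):
    """Draw one set of augmentation parameters; each transform fires with p = 0.5."""
    params = {"rotation_deg": 0, "brightness": 0.0, "scale_pct": 100, "shear_pct": 100}
    if rng.random() < 0.5:
        params["rotation_deg"] = rng.randint(-5, 5)        # 1-degree steps
    if rng.random() < 0.5:
        params["brightness"] = rng.randint(-10, 10) / 100  # steps of 0.01
    if rng.random() < 0.5:
        params["scale_pct"] = rng.randint(95, 105)         # 1% steps
    if rng.random() < 0.5:
        params["shear_pct"] = rng.randint(95, 105)         # 1% steps
    return params


if __name__ == "__main__":
    print(normalize_hu(50.0), crop_size_px())
    print(sample_augmentation(random.Random(0)))
```

With a pixel size of 0.5 mm, the 96 × 96 mm² region corresponds to a 192 × 192 pixel input.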
2][13][14] The first convolutional layers of deep convolutional neural networks were modified to have an input channel of 1. A dropout layer and a sigmoid layer were appended to the last fully connected layer of the networks. To reduce the interdependent network elements, the dropout layer randomly ignored the hidden layer nodes in the training process with a probability of 0.25.
To train the deep learning models, the training dataset was divided into a training subset (for model development) and a validation subset (for evaluating model performance with different hyperparameter values and for detecting any overfitting that occurred during training). Patients who underwent surgery between 2016 and 2017 were randomly separated into the training (728 patients) and validation (153 patients) subsets. Models were trained with the Adam optimizer with a batch size of 32 and a learning rate of 1e-4. The loss function was binary cross-entropy. The maximum number of epochs was 300; however, training was stopped early when the loss in the validation subset did not decrease for 10 epochs. The models were implemented in Python 3.8 with PyTorch 1.8 on an Nvidia GeForce RTX 2080 Ti GPU.
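The stopping rule above (at most 300 epochs, with training halted once the validation loss has not decreased for 10 epochs) can be sketched independently of any deep learning framework. The function below takes a precomputed sequence of validation losses purely for illustration; in practice each value would come from evaluating the model after an epoch of training. All names are hypothetical.

```python
def run_early_stopping(validation_losses, max_epochs=300, patience=10):
    """Return (stop_epoch, best_epoch, best_loss) for per-epoch validation losses.

    Training stops when the loss has not improved on its best value for
    `patience` consecutive epochs, or after `max_epochs` epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    stale_epochs = 0
    stop_epoch = 0
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return stop_epoch, best_epoch, best_loss


if __name__ == "__main__":
    # The loss improves for three epochs and then plateaus at a worse value.
    losses = [1.0, 0.9, 0.8] + [0.85] * 20
    print(run_early_stopping(losses))
```

A common companion to this rule is to keep the weights from `best_epoch` rather than from the final epoch; the text does not say which checkpoint was retained.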
The gradient-guided class attention maps (Grad-CAM++) 15 overlaid on CT images were generated by averaging the attention maps of the deep learning models included in the ensemble models. 16 Two model values (ResNet and Inception v3) were averaged for predicting all postoperative pancreatic fistula (POPF), and three model values (ResNet, DenseNet, and ResNeXt) were averaged for predicting clinically relevant POPF (CR-POPF).

Ensemble model
Ensemble learning was performed separately for the preoperative model and the comprehensive model. Machine learning, deep learning, and prior models 17,18 were combined by ensemble voting, using either the soft or the hard voting method. 19 In hard voting, the output of the ensemble was the proportion of models that predicted the class as positive (i.e., the probability predicted by a model was >0.5), whereas in soft voting, the output of the ensemble was the average of the probabilities predicted by each model. For each voting method, we searched all possible combinations of models by grid search. The final ensemble of models was chosen according to two conditions: (1) the highest accuracy in the validation subset and (2) an absolute difference of <5% between the accuracies in the training and validation subsets, to avoid overfitting.
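The two voting rules and the selection criteria above can be sketched as an exhaustive search over model subsets. This is a minimal illustration rather than the study's implementation: it assumes that an ensemble output >0.5 is called positive (so a tie in hard voting counts as negative), reads the <5% condition as an absolute accuracy gap below 0.05, and uses hypothetical function and model names.

```python
from itertools import combinations


def soft_vote(probabilities):
    """Average of the probabilities predicted by each model."""
    return sum(probabilities) / len(probabilities)


def hard_vote(probabilities, threshold=0.5):
    """Proportion of models whose predicted probability exceeds the threshold."""
    return sum(p > threshold for p in probabilities) / len(probabilities)


def accuracy(model_probs, labels, vote):
    """Accuracy of the voted ensemble; model_probs is a list of per-model probability lists."""
    correct = 0
    for i, label in enumerate(labels):
        score = vote([probs[i] for probs in model_probs])
        # Assumption: an ensemble output > 0.5 is called positive.
        correct += int((score > 0.5) == bool(label))
    return correct / len(labels)


def select_ensemble(train_probs, train_labels, val_probs, val_labels, max_gap=0.05):
    """Search every non-empty model subset and both voting rules.

    train_probs and val_probs map a model name to its list of probabilities.
    Returns (validation accuracy, model names, voting rule), or None if no
    candidate has a train/validation accuracy gap below max_gap.
    """
    best = None
    names = sorted(train_probs)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            for vote_name, vote in (("soft", soft_vote), ("hard", hard_vote)):
                acc_train = accuracy([train_probs[n] for n in combo], train_labels, vote)
                acc_val = accuracy([val_probs[n] for n in combo], val_labels, vote)
                if abs(acc_train - acc_val) >= max_gap:
                    continue  # condition (2): reject likely overfitting
                if best is None or acc_val > best[0]:
                    best = (acc_val, combo, vote_name)  # condition (1)
    return best


if __name__ == "__main__":
    print(soft_vote([0.2, 0.6, 0.9]), hard_vote([0.2, 0.6, 0.9]))
```

With n candidate models the search covers 2^n − 1 subsets per voting rule, which is manageable for the handful of models described here but grows quickly as candidates are added.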

Table 4. Characteristics of the patients with CR-POPF
Note: Unless otherwise indicated, data are means, with standard deviations in parentheses. Abbreviations: PDAC, pancreatic ductal adenocarcinoma; POD, postoperative day; CRP, C-reactive protein.